Spark dayname function implementation #20825

kazantsev-maksim wants to merge 22 commits into apache:main from
Conversation
```
query T
SELECT dayname('2008-02-20'::DATE);
----
```
Should we also verify that the data type matches the Spark data types (Utf8 / LargeUtf8 / Utf8View)?
```rust
fn spark_day_name(days: i32) -> String {
    let weekday = Date32Type::to_naive_date_opt(days).unwrap().weekday();
    let display_name = get_display_name(weekday.num_days_from_monday());
    display_name.unwrap()
```
Potential panic in this unwrap?
I tried to improve it
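To make the panic concern concrete, here is a minimal std-only sketch of a panic-free variant. The function name mirrors the PR's `spark_day_name`, but the epoch arithmetic and name table are illustrative stand-ins for the `Date32Type`/chrono calls, not the actual implementation:

```rust
// Hypothetical panic-free sketch: 1970-01-01 (day 0) was a Thursday, so the
// weekday index is the day count modulo 7, with rem_euclid handling negative
// (pre-epoch) values. Returning Option avoids unwrap entirely.
fn spark_day_name(days: i32) -> Option<&'static str> {
    const NAMES: [&str; 7] = ["Thu", "Fri", "Sat", "Sun", "Mon", "Tue", "Wed"];
    NAMES.get((days as i64).rem_euclid(7) as usize).copied()
}

fn main() {
    assert_eq!(spark_day_name(0), Some("Thu"));  // 1970-01-01
    assert_eq!(spark_day_name(-1), Some("Wed")); // 1969-12-31
    assert!(spark_day_name(i32::MIN).is_some()); // no panic at the extremes
    println!("ok");
}
```

The point of the sketch is the signature: returning `Option` (or `Result`) pushes the "can this date be invalid?" question to the caller instead of crashing inside the kernel.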
```rust
}

fn spark_day_name(days: i32) -> String {
    let weekday = Date32Type::to_naive_date_opt(days).unwrap().weekday();
```
Here as well? Could we guarantee that this unwrap never panics?
I tried to improve it
Test failures:
Thanks for the review @coderfender. Could you please take another look when you have time?
coderfender left a comment:

I think the changes are looking good @kazantsev-maksim. I left a couple of questions for further review. Thank you very much for your patience :)
```
query T
SELECT dayname('2010-04-24'::TIMESTAMP);
----
```
Could we also add tests with actual timestamps instead of dates parsed as timestamps here?
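A sketch of what such a test might look like, assuming the sqllogictest format used elsewhere in this file (the timestamp literal and expected value are illustrative):

```
query T
SELECT dayname(TIMESTAMP '2010-04-24 12:30:00');
----
Sat
```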
```rust
DataType::Date32 | DataType::Timestamp(_, _) => spark_day_name_inner(array),
DataType::Utf8 | DataType::Utf8View | DataType::LargeUtf8 => {
    let date_array =
        cast_with_options(array, &DataType::Date32, &CastOptions::default())?;
```
What would happen in case of an invalid string or error here? We could probably handle it and throw an error if the input is malformed.
Invalid strings will be replaced with null values.
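For context, arrow's `CastOptions::default()` uses safe casting (as far as I know), so cast failures become nulls rather than errors, which is why the above holds. A std-only sketch of that null-on-invalid semantics, where `parse_date` is a hypothetical, deliberately minimal parser standing in for the real cast kernel:

```rust
// Sketch of "safe cast" semantics: malformed input maps to None (NULL in the
// output array) instead of raising an error. parse_date is a hypothetical
// minimal YYYY-MM-DD parser, NOT the arrow cast kernel.
fn parse_date(s: &str) -> Option<(i32, u32, u32)> {
    let mut parts = s.split('-');
    let y: i32 = parts.next()?.parse().ok()?;
    let m: u32 = parts.next()?.parse().ok()?;
    let d: u32 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || !(1..=12).contains(&m) || !(1..=31).contains(&d) {
        return None;
    }
    Some((y, m, d))
}

fn main() {
    assert_eq!(parse_date("2008-02-20"), Some((2008, 2, 20)));
    assert_eq!(parse_date("not-a-date"), None); // becomes NULL, not an error
    assert_eq!(parse_date(""), None);
    println!("ok");
}
```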
```rust
        6 => Some(String::from("Sun")),
        _ => None,
    }
}
```
I don't think we're handling timezone support here? Spark handles timezones in its current implementation, and this could produce wrong results.
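To illustrate the concern with a std-only sketch (the names and fixed-offset model are illustrative; a real fix would thread Spark's session timezone, including DST rules, through the timestamp path): a timestamp near midnight UTC lands on a different calendar day, and therefore a different weekday name, once an offset is applied.

```rust
// Day 0 (1970-01-01) was a Thursday, so the weekday index is the local day
// count modulo 7. A fixed offset in seconds stands in for a named timezone.
const WEEKDAYS: [&str; 7] = ["Thu", "Fri", "Sat", "Sun", "Mon", "Tue", "Wed"];

fn day_name_with_offset(epoch_secs: i64, tz_offset_secs: i64) -> &'static str {
    let local_days = (epoch_secs + tz_offset_secs).div_euclid(86_400);
    WEEKDAYS[local_days.rem_euclid(7) as usize]
}

fn main() {
    // 2010-04-24 23:30:00 UTC (a Saturday): day 14723 since the epoch.
    let ts: i64 = 14_723 * 86_400 + 23 * 3_600 + 30 * 60;
    assert_eq!(day_name_with_offset(ts, 0), "Sat");         // UTC
    assert_eq!(day_name_with_offset(ts, 2 * 3_600), "Sun"); // UTC+02:00
    println!("ok");
}
```

The same instant yields "Sat" in UTC but "Sun" in UTC+02:00, which is exactly the class of wrong result the review comment is pointing at.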
```rust
    TypeSignatureClass::Timestamp,
)]),
TypeSignature::Coercible(vec![Coercion::new_exact(
    TypeSignatureClass::Native(logical_date()),
```
Spark's date (java.util.Date) is Date32 on the Rust side of things. Given that we are not handling Date64 in the match arm, do you still think we should have logical_date() as one of the signatures, which supports both Date32 and Date64? Perhaps we could restrict it to Date32?
```
query T
SELECT dayname('2010-04-24'::TIMESTAMP);
----
Sat
```
We might need tests to cover various timezones (at least one apart from UTC) :)
```
----
NULL

query T
```
I tested this in Spark 4.1.1 and found that the empty string behavior depends on ANSI mode:

```
-- ANSI on (Spark 4 default):
spark-sql> SELECT dayname('');
[CAST_INVALID_INPUT] The value '' of the type "STRING" cannot be cast to "DATE" ...

-- ANSI off:
spark-sql> SET spark.sql.ansi.enabled=false;
spark-sql> SELECT dayname('');
NULL
```

The current implementation returns NULL, which matches the non-ANSI behavior. Since Spark 4 defaults to ANSI mode, it might be worth supporting both behaviors here, perhaps through an enable_ansi_mode flag similar to how mod/pmod handle it in the same crate. That way the caller (e.g. Comet or another Spark-compatible engine) can choose whether invalid string-to-date casts error or return NULL, matching whichever ANSI mode the user has configured.
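A sketch of what such a flag could look like. All names here, including `enable_ansi_mode` and the crude validity check, are hypothetical; the real version would sit on the string-to-date cast path:

```rust
// Hypothetical sketch of ANSI-dependent behavior for invalid date strings:
// ANSI on -> error, ANSI off -> NULL (None). The length/dash check is a
// deliberately crude stand-in for the real string-to-date cast.
fn dayname_from_str(s: &str, enable_ansi_mode: bool) -> Result<Option<String>, String> {
    let bytes = s.as_bytes();
    let looks_like_date = bytes.len() == 10 && bytes[4] == b'-' && bytes[7] == b'-';
    if !looks_like_date {
        return if enable_ansi_mode {
            // Mirrors Spark's [CAST_INVALID_INPUT] error class.
            Err(format!("[CAST_INVALID_INPUT] The value '{s}' cannot be cast to \"DATE\""))
        } else {
            Ok(None) // non-ANSI: invalid input becomes NULL
        };
    }
    // Placeholder result; the real implementation would compute the weekday.
    Ok(Some(String::from("Sat")))
}

fn main() {
    assert!(dayname_from_str("", true).is_err());
    assert_eq!(dayname_from_str("", false), Ok(None));
    println!("ok");
}
```

Threading the flag through the UDF's constructor (as mod/pmod reportedly do) keeps the kernel itself branch-free per row apart from the error path.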
Which issue does this PR close?
N/A
Rationale for this change
Add a new Spark function: https://spark.apache.org/docs/latest/api/sql/index.html#dayname
What changes are included in this PR?
Are these changes tested?
Yes, tests added as part of this PR.
Are there any user-facing changes?
No, this is a new function.